{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "## 时间序列预测案例一: 股票预测\n",
    "> 使用上证指数的收盘价。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "1. 首先用 tushare 下载上证指数的K线数据，然后作标准化处理。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tushare as ts\n",
    "data_close = ts.get_k_data('000001', start='2018-01-01', index=True)['close'].values  # 获取上证指数从20180101开始的收盘价的np.ndarray\n",
    "data_close = data_close.astype('float32')  # 转换数据类型\n",
    "# 将价格标准化到0~1\n",
    "max_value = np.max(data_close)\n",
    "min_value = np.min(data_close)\n",
    "data_close = (data_close - min_value) / (max_value - min_value)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "原始数据：上证指数从2018-01-01到2019-05-24的收盘价（未标准化处理）\n",
    "\n",
    "把K线数据进行分割，每 DAYS_FOR_TRAIN 个收盘价对应 1 个未来的收盘价。例如K线为 [1,2,3,4,5]， DAYS_FOR_TRAIN=3，那么将会生成2组数据：\n",
    "\n",
    "第1组的输入是 [1,2,3]，对应输出 4；\n",
    "\n",
    "第2组的输入是 [2,3,4]，对应输出 5。\n",
    "\n",
    "然后只使用前70%的数据用于训练，剩下的不用，用来与实际数据进行对比。"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "DAYS_FOR_TRAIN = 10\n",
    "\n",
    "def create_dataset(data, days_for_train=5) -> (np.array, np.array):\n",
    "    \"\"\"\n",
    "        根据给定的序列data，生成数据集\n",
    "\n",
    "        数据集分为输入和输出，每一个输入的长度为days_for_train，每一个输出的长度为1。\n",
    "        也就是说用days_for_train天的数据，对应下一天的数据。\n",
    "\n",
    "        若给定序列的长度为d，将输出长度为(d-days_for_train+1)个输入/输出对\n",
    "    \"\"\"\n",
    "    dataset_x, dataset_y= [], []\n",
    "    for i in range(len(data)-days_for_train):\n",
    "        _x = data[i:(i+days_for_train)]\n",
    "        dataset_x.append(_x)\n",
    "        dataset_y.append(data[i+days_for_train])\n",
    "    return (np.array(dataset_x), np.array(dataset_y))\n",
    "\n",
    "dataset_x, dataset_y = create_dataset(data_close, DAYS_FOR_TRAIN)\n",
    "\n",
    "# 划分训练集和测试集，70%作为训练集\n",
    "train_size = int(len(dataset_x) * 0.7)\n",
    "\n",
    "train_x = dataset_x[:train_size]\n",
    "train_y = dataset_y[:train_size]\n",
    "\n",
    "# 将数据改变形状，RNN 读入的数据维度是 (seq_size, batch_size, feature_size)\n",
    "train_x = train_x.reshape(-1, 1, DAYS_FOR_TRAIN)\n",
    "train_y = train_y.reshape(-1, 1, 1)\n",
    "\n",
    "# 转为pytorch的tensor对象\n",
    "train_x = torch.from_numpy(train_x)\n",
    "train_y = torch.from_numpy(train_y)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 定义网络、优化器、loss函数"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "class LSTM_Regression(nn.Module):\n",
    "    \"\"\"\n",
    "        使用LSTM进行回归\n",
    "\n",
    "        参数：\n",
    "        - input_size: feature size\n",
    "        - hidden_size: number of hidden units\n",
    "        - output_size: number of output\n",
    "        - num_layers: layers of LSTM to stack\n",
    "    \"\"\"\n",
    "    def __init__(self, input_size, hidden_size, output_size=1, num_layers=2):\n",
    "        super().__init__()\n",
    "        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)\n",
    "        self.fc = nn.Linear(hidden_size, output_size)\n",
    "    def forward(self, _x):\n",
    "        x, _ = self.lstm(_x)  # _x is input, size (seq_len, batch, input_size)\n",
    "        s, b, h = x.shape  # x is output, size (seq_len, batch, hidden_size)\n",
    "        x = x.view(s*b, h)\n",
    "        x = self.fc(x)\n",
    "        x = x.view(s, b, -1)  # 把形状改回来\n",
    "        return x\n",
    "model = LSTM_Regression(DAYS_FOR_TRAIN, 8, output_size=1, num_layers=2)\n",
    "loss_function = nn.MSELoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 训练"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "source": [
    "for i in range(1000):\n",
    "    out = model(train_x)\n",
    "    loss = loss_function(out, train_y)\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "    optimizer.zero_grad()\n",
    "    if (i+1) % 100 == 0:\n",
    "        print('Epoch: {}, Loss:{:.5f}'.format(i+1, loss.item()))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 测试"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "source": [
    "import matplotlib.pyplot as plt\n",
    "model = model.eval() # 转换成测试模式\n",
    "# 注意这里用的是全集 模型的输出长度会比原数据少DAYS_FOR_TRAIN 填充使长度相等再作图\n",
    "dataset_x = dataset_x.reshape(-1, 1, DAYS_FOR_TRAIN)  # (seq_size, batch_size, feature_size)\n",
    "dataset_x = torch.from_numpy(dataset_x)\n",
    "pred_test = model(dataset_x) # 全量训练集的模型输出 (seq_size, batch_size, output_size)\n",
    "pred_test = pred_test.view(-1).data.numpy()\n",
    "pred_test = np.concatenate((np.zeros(DAYS_FOR_TRAIN), pred_test))  # 填充0 使长度相同\n",
    "assert len(pred_test) == len(data_close)\n",
    "plt.plot(pred_test, 'r', label='prediction')\n",
    "plt.plot(data_close, 'b', label='real')\n",
    "plt.plot((train_size, train_size), (0, 1), 'g--')\n",
    "plt.legend(loc='best')\n",
    "plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "execution_count": null,
   "outputs": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}